AITopics | soft margin

In offline reinforcement learning (RL) we have no opportunity to explore, so we must assume that the data is sufficient to guide the choice of a good policy. Such assumptions typically take the form of some coverage, realizability, Bellman completeness, and/or hard margin (gap) condition. In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically coverage of a single comparator policy, and realizability of the soft (entropy-regularized) Q-function of that policy and of a related function defined as a saddle point of a certain minimax optimization problem. This offers refined and generally more lax conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft margin condition. To attain these guarantees, we leverage novel minimax learning algorithms to accurately estimate soft or vanilla Q-functions with $L^2$-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
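As a rough illustration of what "soft (entropy-regularized)" means for a Q-function, the sketch below computes the soft state value and the matching softmax policy for a single state. It assumes the common textbook form with a uniform reference policy and a temperature `alpha`; the paper may regularize toward a different reference policy, and the names `soft_value`, `soft_policy`, and `alpha` are hypothetical, not taken from the paper.

```python
import numpy as np

def soft_value(q_values, alpha):
    """Soft state value: alpha * log(sum_a exp(Q(s, a) / alpha)).

    Assumption: plain entropy regularization with temperature alpha.
    """
    q = np.asarray(q_values, dtype=float)
    m = q.max()  # subtract the max before exponentiating for numerical stability
    return m + alpha * np.log(np.exp((q - m) / alpha).sum())

def soft_policy(q_values, alpha):
    """Softmax policy with pi(a | s) proportional to exp(Q(s, a) / alpha)."""
    q = np.asarray(q_values, dtype=float)
    p = np.exp((q - q.max()) / alpha)
    return p / p.sum()

# Hypothetical Q-values for one state with three actions.
q = np.array([1.0, 2.0, 0.5])
alpha = 0.5
v_soft = soft_value(q, alpha)
pi = soft_policy(q, alpha)

# The soft value equals expected Q under pi plus alpha times the entropy of pi.
entropy = -(pi * np.log(pi)).sum()
v_check = (pi * q).sum() + alpha * entropy
```

As `alpha` shrinks toward zero, `soft_value` approaches the ordinary maximum over actions, which is one way to see the vanilla Q-function as the limit of the soft one.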

arxiv preprint arxiv, offline data, q-function, (13 more...)

arXiv.org Machine Learning

2302.02392

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > Strength High (0.46)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Boosting Algorithms for Maximizing the Soft Margin

Neural Information Processing Systems, Apr-6-2023, 14:51:29 GMT

We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum.
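The capping idea can be sketched in isolation. The snippet below is not SoftBoost itself: it only computes the relative-entropy projection of a distribution onto the set of distributions whose entries are at most `cap`, and it ignores the edge constraints that the abstract also mentions. The function name `cap_distribution` and the parameter `cap` are hypothetical.

```python
import numpy as np

def cap_distribution(weights, cap):
    """Relative-entropy projection onto {d : sum(d) = 1, 0 <= d_i <= cap}.

    Assumption: only the capping constraint is enforced; the edge
    constraints used by the full algorithm are left out of this sketch.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size
    if np.any(w <= 0):
        raise ValueError("weights must be strictly positive")
    if cap * n < 1.0 - 1e-12:
        raise ValueError("cap must be at least 1/n for a feasible distribution")
    w = w / w.sum()
    order = np.argsort(-w)           # indices from largest to smallest weight
    sorted_w = w[order]
    capped_sorted = np.full(n, cap)  # fallback: every weight sits at the cap
    for k in range(n):
        # Fix the k largest weights at the cap and rescale the rest
        # so that the total mass is still one.
        rest = sorted_w[k:]
        scale = (1.0 - k * cap) / rest.sum()
        if rest[0] * scale <= cap + 1e-12:
            capped_sorted = np.concatenate((np.full(k, cap), rest * scale))
            break
    result = np.empty(n)
    result[order] = capped_sorted    # undo the sort
    return result

# One example dominates the distribution; no example may exceed 0.5.
capped = cap_distribution([0.7, 0.2, 0.1], cap=0.5)
```

Here the dominant weight is clipped to the cap and the freed mass is spread proportionally over the remaining examples, so no single (possibly noisy) example can absorb most of the distribution.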

algorithm, maximizing, soft margin, (7 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.09)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Chance constrained conic-segmentation support vector machine with uncertain data

Peng, Shen, Canessa, Gianpiero, Allen-Zhao, Zhihua

arXiv.org Artificial Intelligence, Sep-22-2022

In classification problems, a classifier is a function that mimics the relationship between the data vectors and their class labels. The support vector machine (SVM) is a popular classifier, which was proposed by Cortes and Vapnik [1] as a maximum margin classifier. The success of the SVM has encouraged further research into extensions to the more general multiclass cases, which has been an active research topic [2-4]. Shilton et al. [5] proposed the conic-segmentation support vector machine (CS-SVM) by introducing the concept of target space into the problem formulation and showed that some other multiclass classification models are special cases of this framework. The standard CS-SVM deals with the situation where the exact values of the data points are known.

artificial intelligence, conic-segmentation support vector machine, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2107.13319

Country:

North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
